Optimal Data-Based Binning for Histograms
Abstract
Histograms are convenient non-parametric density estimators, which continue to be used ubiquitously. Summary quantities estimated from histogram-based probability density models depend on the choice of the number of bins. In this paper we introduce a straightforward data-based method of determining the optimal number of bins in a uniform bin-width histogram. Using the Bayesian framework, we derive the posterior probability for the number of bins in the density model given the data. The most probable solution is determined naturally by a balance between the likelihood function, which increases with increasing number of bins, and the prior probability of the model, which decreases with increasing number of bins. We demonstrate how these results outperform several well-accepted rules for choosing bin sizes even in the integrated square error sense. Last, we demonstrate that these results can be applied directly to multi-dimensional histograms.
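The balance described in the abstract can be sketched in a few lines of Python. The abstract itself does not state the posterior, so the expression below is an assumption: it is the form commonly quoted from the full paper, log p(M | d) = N log M + log Γ(M/2) − M log Γ(1/2) − log Γ(N + M/2) + Σ_k log Γ(n_k + 1/2) + const, where N is the number of data points, M the number of bins, and n_k the count in bin k. The function names `log_posterior` and `optimal_bins` and the `max_bins` search limit are hypothetical and chosen only for illustration.

```python
import math


def log_posterior(data, m):
    """Relative log posterior for m equal-width bins (additive constant dropped).

    Assumes the commonly quoted form of the posterior; it is not given
    in the abstract above.
    """
    n = len(data)
    lo, hi = min(data), max(data)
    if hi == lo:
        raise ValueError("data must span a non-zero range")
    width = (hi - lo) / m

    # Count points per bin; clamp so the maximum value lands in the last bin.
    counts = [0] * m
    for x in data:
        k = min(int((x - lo) / width), m - 1)
        counts[k] += 1

    return (n * math.log(m)
            + math.lgamma(m / 2)
            - m * math.lgamma(0.5)
            - math.lgamma(n + m / 2)
            + sum(math.lgamma(c + 0.5) for c in counts))


def optimal_bins(data, max_bins=100):
    """Return the bin count in 1..max_bins with the highest log posterior."""
    return max(range(1, max_bins + 1), key=lambda m: log_posterior(data, m))
```

A brute-force scan over candidate bin counts is used here for clarity; it costs one pass over the data per candidate, which is acceptable for one-dimensional data sets of modest size.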
Similar resources
Adaptive Binning and Dissimilarity Measure for Image Retrieval and Classification
The color histogram is an important part of content-based image retrieval systems. It is a common understanding that histograms that adapt to images can represent their color distributions more efficiently than histograms with fixed binnings. However, among existing dissimilarity measures, only the Earth Mover’s Distance can compare histograms with different binnings. This paper presents a detailed...
Experiments in Binning Image Statistics
Various vision tasks require the computation of image statistics and aggregating them into histograms. These histograms are usually compared using the χ² distance which gives a rough idea of the similarity of the two image patches. For example, in [2] a histogram of key-points is collected on a rectangular grid for the category recognition task. Other researchers, [3], have used local image stat...
Adaptive histograms and dissimilarity measure for texture retrieval and classification
Histogram-based dissimilarity measures are extensively used for content-based image retrieval. In an earlier paper [1], we proposed an efficient weighted correlation dissimilarity measure for adaptive-binning color histograms. Compared to existing fixed-binning histograms and dissimilarity measures, adaptive histograms together with weighted correlation produce the best overall performance in t...
The analysis and applications of adaptive-binning color histograms
Histograms are commonly used in content-based image retrieval systems to represent the distributions of colors in images. It is a common understanding that histograms that adapt to images can represent their color distributions more efficiently than do histograms with fixed binnings. However, existing systems almost exclusively adopt fixed-binning histograms because, among existing well-known d...
Markov Chain Driven Multi-Dimensional Visual Pattern Analysis with Parallel Coordinates
Parallel coordinates is a widely used visualization technique for presenting, analyzing and exploring multidimensional data. However, like many other visualizations, it can suffer from an overplotting problem when rendering large data sets. Until now, quite a few methods have been proposed to discover and illustrate the major data trends in cluttered parallel coordinates. Among them, frequency-based ...